Diffusing Gaussian Mixtures for Generating Categorical Data
Abstract
Learning a categorical distribution comes with its own set of challenges. A successful approach taken by state-of-the-art works is to cast the problem in a continuous domain to take advantage of the impressive performance of generative models for continuous data. Amongst them are the recently emerging diffusion probabilistic models, which have the observed advantage of generating high-quality samples. Recent advances for categorical generative models have focused on log-likelihood improvements. In this work, we propose a generative model for categorical data based on diffusion models, with a focus on high-quality sample generation, and propose sample-based evaluation methods. The efficacy of our method stems from performing diffusion in the continuous domain while having its parameterization informed by the structure of the categorical nature of the target distribution. Our method of evaluation highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets.
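The idea of diffusing categorical data in a continuous domain can be illustrated with a small sketch. It is not the authors' parameterization: it assumes each category is embedded as a one-hot vector, that noise follows a linear DDPM-style schedule, and that the helper names (`alpha_bar`, `diffuse`, `category_posterior`) are hypothetical. Under these assumptions the noised data follows a Gaussian mixture with one component per category, so the posterior over categories given a noisy point has a closed form.

```python
import numpy as np

def alpha_bar(t, T=1000, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal level after t >= 1 steps of an assumed linear beta schedule."""
    betas = np.linspace(beta_min, beta_max, T)
    return np.prod(1.0 - betas[:t])

def diffuse(labels, t, K, rng):
    """Sample x_t ~ N(sqrt(a) * onehot(label), (1 - a) * I) for each label."""
    a = alpha_bar(t)
    x0 = np.eye(K)[labels]                     # (N, K) one-hot embedding (assumption)
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def category_posterior(x_t, t, prior):
    """p(category | x_t) under the Gaussian mixture that x_t follows at step t."""
    a = alpha_bar(t)
    K = prior.shape[0]
    means = np.sqrt(a) * np.eye(K)             # one mixture mean per category
    sq_dist = ((x_t[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    logits = np.log(prior) - sq_dist / (2.0 * (1.0 - a))
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
labels = np.array([0, 2, 1, 1, 0])
prior = np.full(3, 1.0 / 3.0)
x_t = diffuse(labels, t=10, K=3, rng=rng)
posterior = category_posterior(x_t, t=10, prior=prior)
```

At small `t` the mixture components barely overlap, so the posterior concentrates on the original category; as `t` grows the components merge and the posterior approaches the prior.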
Similar resources
Infinite mixtures for multi-relational categorical data
Large relational datasets are prevalent in many fields. We propose an unsupervised component model for relational data, i.e., for heterogeneous collections of categorical co-occurrences. The co-occurrences can be dyadic or n-adic, and over the same or different categorical variables. Graphs are a special case, as collections of dyadic co-occurrences (edges) over a set of vertices. The model is s...
Random Forests for Generating Partially Synthetic, Categorical Data
Several national statistical agencies are now releasing partially synthetic, public use microdata. These comprise the units in the original database with sensitive or identifying values replaced with values simulated from statistical models. Specifying synthesis models can be daunting in databases that include many variables of diverse types. These variables may be related in ways that can be diff...
Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data
Sample code | Thickness | Unif. Cell Size | Unif. Cell Shape | Marginal Adhesion | Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class
1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | Benign
1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | Benign
1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | Benign
1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | Benign
1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | Benign
1017122 | 8 | 10 | 10 | 8 | 7 | 10 | 9 | 7 | 1 | Malignant
1018099 | 1 | 1 | 1 | 1 | 2 | 10 | 3 | 1 | 1 | Benign
101856...
Estimation of latent Gaussian ARMA models for categorical behaviour data
We consider the fitting of latent Gaussian models to categorical time series of cow feeding data. We derive a spectral quasi-likelihood for the data, and compare it with least squares fits to autocorrelations and MCMC estimators of the parameters in thresholded ARMA processes. We show that the spectral method is more efficient than least squares and far faster than MCMC.
Overlapping Mixtures of Gaussian Processes for the data association problem
In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i8.26145